Language Identification using Classifier Ensembles
نویسندگان
چکیده
In this paper we describe the language identification system we developed for the Discriminating Similar Languages (DSL) 2015 shared task. We constructed a classifier ensemble composed of several Support Vector Machine (SVM) base classifiers, each trained on a single feature type. Our feature types include character 1–6 grams and word unigrams and bigrams. Using this system we were able to outperform the other entries in the closed training track of the DSL 2015 shared task, achieving the best accuracy of 95.54%.
منابع مشابه
Classifier ensembles for image identification using multi-objective Pareto features
In this paper we propose classifier ensembles that use multiple Pareto image features for invariant image identification. Different from traditional ensembles that focus on enhancing diversity by generating diverse base classifiers, the proposed method takes advantage of the diversity inherent in the Pareto features extracted using a multi-objective evolutionary Trace Transform algorithm. Two v...
متن کاملLTG at SemEval-2016 Task 11: Complex Word Identification with Classifier Ensembles
We present the description of the LTG entry in the SemEval-2016 Complex Word Identification (CWI) task, which aimed to develop systems for identifying complex words in English sentences. Our entry focused on the use of contextual language model features and the application of ensemble classification methods. Both of our systems achieved good performance, ranking in 2nd and 3rd place overall in ...
متن کاملOPT: Oslo-Potsdam-Teesside. Pipelining Rules, Rankers, and Classifier Ensembles for Shallow Discourse Parsing
The OPT submission to the Shared Task of the 2016 Conference on Natural Language Learning (CoNLL) implements a ‘classic’ pipeline architecture, combining binary classification of (candidate) explicit connectives, heuristic rules for non-explicit discourse relations, ranking and ‘editing’ of syntactic constituents for argument identification, and an ensemble of classifiers to assign discourse se...
متن کاملMultilingual native language identification
We present the first study of Native Language Identification (NLI) applied to text written in languages other than English, using data from six languages. NLI is the task of predicting an author’s first language (L1) using only their writings in a second language (L2), with applications in Second Language Acquisition and forensic linguistics. Most research to date has focused on English but the...
متن کاملA genetic approach for training diverse classifier ensembles
Classification is an active topic of Machine Learning. The most recent achievements in this domain suggest using ensembles of learners instead of a single classifier to improve classification accuracy. Comparisons between Bagging and Boosting show that classifier ensembles perform better when their members exhibit diversity, that is commit different errors. This paper proposes a genetic algorit...
متن کامل